Ardour3 Video Integration 

film-soundtracks on GNU/Linux 


Robin Gareus 


CiTu - Paragraphe Research Group
University Paris 8 - Hypermedia Department 
robin@gareus.org 


April, 2012 


Outline of the talk 


Introduction
Problem Analysis
API
Video Server
Client Implementation
Coda


Introduction 


- Soundtrack composition, arrangement and production is a rather
  young discipline
    - about 70 years old
    - rapidly changing technology

- Nevertheless, standard procedures have evolved

- Audio and video remain separated for historical and practical
  reasons:
    - different qualifications and expertise are needed
    - too much gear on the camera
    - job definitions by unions

Problem Analysis

- ... but: overdubs, translations)
- sound effects (Foley, soundscapes, spot sounds - tight sync)
- film music

One goal is to synchronize dramatic events happening on screen with
musical events in the score. With only a few exceptions, namely song or
dance scenes, music composition and sound design usually take place
after recording and editing the video.

- analog: audio-visual cues (slate)
- digital: time-code (most commonly SMPTE, produced by the camera)
- digital: EDL, AAF, MXF, BWF, OMF+OMFI, ...

The technical skills and details involved can become quite complex. Neither
composers nor sound designers want to concern themselves with that
task.
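
To make the time-code bullet concrete, here is a minimal sketch of how a zero-based frame count maps to a non-drop-frame HH:MM:SS:FF label and back. It assumes an integer frame rate (25 fps, as used for PAL); the function names are illustrative only, and drop-frame rates such as 29.97 fps would need additional handling.

```python
def frames_to_timecode(frame: int, fps: int = 25) -> str:
    """Format a zero-based frame count as non-drop-frame HH:MM:SS:FF."""
    if frame < 0 or fps <= 0:
        raise ValueError("frame must be >= 0 and fps must be > 0")
    seconds, ff = divmod(frame, fps)      # leftover frames within the second
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def timecode_to_frames(timecode: str, fps: int = 25) -> int:
    """Inverse of frames_to_timecode for non-drop-frame labels."""
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


if __name__ == "__main__":
    print(frames_to_timecode(90026))          # one hour, one second, one frame
    print(timecode_to_frames("01:00:01:01"))
```

Counting in frames keeps audio and video positions exact; the label is only a display format.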

- easy-to-use
- professional
- workflow for film-sound production
- free software

More specifically: integration of video elements into the Ardour Digital
Audio Workstation. The resulting interface must not be limited to the
software at hand (Ardour, Xjadeo, icsd) but allow for further adaptation or
interoperability.

- Client-server model
    - external video-monitoring application
    - external video-decoder application
- "Video-server"
    - video-frame cache
    - video session management
    - modular design: pluggable video-decoder back-end

API

All requests from the GUI to the back-end are made via HTTP.

- HTTP is a well-supported protocol with existing infrastructure
    - established proxy and load-balancing systems
    - persistent HTTP connections
    - web interface
- ...but: no out-of-band communication

Request handlers:

- info (file and/or session information)
- image (video frame, thumbnail)
- stream (export, render)
- admin (cache flush, status, ...)

- Time: video-frame count (duration, offset, time)
- Geometry: effective image size (incl. pixel aspect and display aspect)
- Frame rate: ratio
- Image: various formats (raw RGB[A], PNG, JPG, YUV, ...)
- Text: serialized key-value store in various formats (XML, JSON, ...)

The reply format is chosen by the file extension and may be overridden
using request parameters.
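
The rule for picking the reply format can be sketched as follows. This is an illustration only: the parameter name `format`, the fallback default and the set of accepted format names are assumptions, not the server's actual interface.

```python
import os

# Hypothetical list of accepted reply formats; the real server's set may differ.
KNOWN_FORMATS = {"png", "jpg", "jpeg", "rgb", "rgba", "yuv", "xml", "json"}


def choose_reply_format(path: str, params: dict, default: str = "png") -> str:
    """Pick the reply format from the request path, unless a parameter overrides it."""
    override = params.get("format")          # assumed parameter name
    if override:
        fmt = override.lower()
    else:
        # "/frame.png" -> "png"; a path without extension falls back to the default
        fmt = os.path.splitext(path)[1].lstrip(".").lower() or default
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"unsupported reply format: {fmt}")
    return fmt


if __name__ == "__main__":
    print(choose_reply_format("/frame.png", {}))
    print(choose_reply_format("/frame.png", {"format": "jpg"}))
```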

Requesting a single video-frame

- file name or session name (required)
- frame: the frame number (starting at zero for the first frame;
  required)
- width, height (optional)
- reply format (optional, implicit)
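
As a rough illustration of such a request, the sketch below assembles a URL from the parameters listed above. The base URL and the query keys (`file`, `frame`, `w`, `h`, `format`) are placeholders chosen for this example; the slides do not spell out the server's actual parameter names.

```python
from urllib.parse import urlencode


def frame_request_url(base_url, file, frame, width=None, height=None, fmt=None):
    """Build a request URL for one video frame (hypothetical query keys)."""
    if frame < 0:
        raise ValueError("frame numbers start at zero")
    query = {"file": file, "frame": frame}   # the two required parameters
    if width is not None:
        query["w"] = width                   # optional geometry
    if height is not None:
        query["h"] = height
    if fmt is not None:
        query["format"] = fmt                # optional reply-format override
    return f"{base_url}?{urlencode(query)}"


if __name__ == "__main__":
    # Placeholder host, port and handler path.
    print(frame_request_url("http://localhost:8080/image", "/movies/clip.avi", 0))
```

Leaving out width and height lets the server fall back to its own default geometry, which matches their optional status in the list.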

Video Server

- HTTP server in POSIX C
- a video-frame cache
- multiple decoder instances:
    - parallel decoding
    - efficiency: keep state, key-frame continuity
- tested on GNU/Linux, OSX and Win32
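
The idea of a video-frame cache in front of the decoders can be sketched as below. The slides do not state the eviction policy or the cache key, so least-recently-used eviction and the `(file, frame, width, height)` key are assumptions, and the real server is written in C rather than Python.

```python
from collections import OrderedDict


class FrameCache:
    """Tiny LRU cache that calls a decoder only on cache misses."""

    def __init__(self, capacity, decode):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.decode = decode              # callable(file, frame, width, height)
        self._frames = OrderedDict()      # oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, file, frame, width, height):
        key = (file, frame, width, height)
        if key in self._frames:
            self._frames.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self._frames[key]
        self.misses += 1
        image = self.decode(file, frame, width, height)
        self._frames[key] = image
        if len(self._frames) > self.capacity:
            self._frames.popitem(last=False)  # evict the least recently used
        return image


if __name__ == "__main__":
    cache = FrameCache(2, lambda f, n, w, h: f"{f}#{n}@{w}x{h}")
    cache.get("clip.avi", 0, 80, 60)
    cache.get("clip.avi", 0, 80, 60)
    print(cache.hits, cache.misses)
```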

- Cache hits are dominated by transfer time (a few ms, depending on
  image size and network).
- Decoder latency is highly dependent on the geometry of the movie,
  codec, CPU and I/O.
- On slower CPUs (< 1.6 GHz Intel), full-HD video can be decoded
  and scaled at 25 fps using the MJPEG codec with only intra-frames, at the
  cost of high I/O.
- Faster CPUs can shift the load towards the CPU.
- Parallelizing requests increases latency to up to a few hundred
  milliseconds.
- This is intended behavior: the images for a whole viewport page
  arrive simultaneously.
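
On the client side, requesting a whole page of thumbnails in parallel could look roughly like this. `fetch_frame` is a stand-in for whatever function performs the HTTP request, and the limit of eight parallel requests is an arbitrary choice for the example, not a value taken from Ardour.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_page(fetch_frame, frame_numbers, max_parallel=8):
    """Request all frames of one page in parallel and return them together."""
    frames = list(frame_numbers)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() yields results in request order, so frames and images line up.
        images = list(pool.map(fetch_frame, frames))
    return dict(zip(frames, images))


if __name__ == "__main__":
    # Dummy fetcher; a real client would issue one HTTP request per frame.
    page = fetch_page(lambda n: f"image-for-frame-{n}", range(0, 200, 25))
    print(sorted(page))
```

Each individual request may take longer than it would alone, but the page is complete as soon as the slowest request returns.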

Figure: latency for decoding 80x60 thumbnails for 1, 2 and 8 parallel
requests (decoding MPEG-4 and transferring 80x60 px RGB frames; plot shows
the percentage of requests served within a certain time).

Figure: decoder and request latency for PAL (768x576 px) video (decoding
MPEG-4 and transferring 768x576 px RGB frames; plot shows the percentage of
requests served within a certain time).

Client Implementation

Changes to Ardour3:

- Video-timeline display (unique "ruler")
- HTTP interaction with the video-server
- Xjadeo remote control (video monitor)
- ffmpeg CLI interaction (import/export)
- Dialogs (the largest part)
- Support and helper functions

Outlook

- The building blocks are unit-tested and have received some real-world
  feedback from a handful of developers, artists and engineers.
- Take it to end-users (post Ardour 3.0 release): "as soon as 3.0 is
  released, we'll merge his work into the mainstream code." (Paul D.)
- video-server (GNU/Linux): all dependencies are already in
  debian/stable; packaging the video-server is prepared, pending
  last-minute changes.
- video-server: statically linked versions for Win32 and OSX are available.

ToDo

- update to latest A3 internal API (in particular: audio export)
- multi-track export and mux (5.1)
- video-file "import" (hard-link, session folder, keep original and
  transcoded)
- expose EDL support in UI; ardour: AAF, MXF, BWF, EDL, ...
- messages (fix Inglisch :-) and translate)
- user-definable video-export presets
- optimizations (thread pool for image requests, ..., HTTP Keep-Alive)
- squash bugs: A3: 8k LOC; icsd: 12k LOC